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Abstract 

The orchid family Orchidaceae is one of the largest angiosperm families, including many species of important economic 
value. While chloroplast genomes are very informative for systematics and species identification, there is very limited 
information available on chloroplast genomes in the Orchidaceae. Here, we report the complete chloroplast genomes of the 
medicinal plant Dendrobium officinale and the ornamental orchid Cypripedium macranthos, demonstrating their gene 
content and order and potential RNA editing sites. The chloroplast genomes of the above two species and five known 
photosynthetic orchids showed similarities in structure as well as gene order and content, but differences in the 
organization of the inverted repeat/small single-copy junction and ndh genes. The organization of the inverted repeat/small 
single-copy junctions in the chloroplast genomes of these orchids was classified into four types; we propose that inverted 
repeats flanking the small single-copy region underwent expansion or contraction among Orchidaceae. The AT-rich regions 
of the ycf^ gene in orchids could be linked to the recombination of inverted repeat/small single-copy junctions. Relative 
species in orchids displayed similar patterns of variation in ndli gene contents. Furthermore, fifteen highly divergent 
protein-coding genes were identified, which are useful for phylogenetic analyses in orchids. To test the efficiency of these 
genes serving as markers in phylogenetic analyses, coding regions of four genes (accD, ccsA, matK, and ycf^) were used as a 
case study to construct phylogenetic trees in the subfamily Epidendroideae. High support was obtained for placement of 
previously unlocated subtribes Collabiinae and Dendrobiinae in the subfamily Epidendroideae. Our findings expand 
understanding of the diversity of orchid chloroplast genomes and provide a reference for study of the molecular 
systematics of this family. 
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introduction 

The orchid family Orchidaceae is one of the two largest families 
of flowering plants, with over 25,000 species [1] and five 
recognized subfamihes (Apostasioideae, Cypripedioideae, Epiden- 
droideae, Orchidoideae, and VaniUoideae) [2]. A large number of 
orchids have significant economic value [3]. For example, some 
cultivars have been used as cut flowers or potted plants, whfle 
others can be utilized as food or medicine because of their 
nutritious or medical efficacy. Overexploitation and habitat 
destruction have threatened the survival of many wild orchid 
species. At the same time, numerous cultivated varieties and 
crossbreeds have been developed worldwide. Therefore, molecular 
information on orchids is of interest not only for the study of 
systematics, but also for species conservation and flower cultiva- 
tion. 

Epidendroideae is the largest of the five orchid subfamilies and 
includes approximately 20,000 species [1,2]. Several perspectives 



on its classification have long been debated [2,4-16]. Burns- 
Balogh and Funk (1986) have reviewed previous morphological 
classification systems of Orchidaceae. Freudenstein and Rasmus- 
sen (1999) pointed out that most of the previously established 
classifications have a highly developed tribal and subtribal 
classification within the Epidendroideae; they first performed a 
cladistic analysis of Orchidaceae. However, major groups of 
genera are equivalent to subfamflial groups, and the detailed 
classifications at the tribal level are not well supported by 
morphological and anatomical features in most cases [12,16]. In 
recent years, molecular data has been used in phylogenetic studies, 
but some relationships among subtribes or tribes remain 
questionable. Major disputes were focused on whether some tribes 
or subtribes were monophyletic, polyphyletic, or paraphyletic; 
which tribe or subtribe was the most basal; and the locations of 
AgrostophyUinae, Collabiinae and Dendrobiinae. Limited sam- 
pling with few variable loci in most of the common DNA regions 
has impeded reasonable and robust estimates of phylogenetic 
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patterns. Recent comparative chloroplast (cp) genomics has 
provided large quantities of data that are useful for selecting 
pertinent markers to resolve obscure phylogenetic relationships in 
seed plants [17-21]. However, cp genome information is stiU 
limited for the Epidendroideae. 

In most land plants, the cp genome is a single circular molecule 
of 120-220 kb that consists of one large single-copy (LSC) region, 
one small single-copy (SSC) region, and a pair of inverted repeats 
(IRs). Although gene organization and content are conserved in cp 
genomes of higher plants, their genome sizes are diverse and 
depend largely on the extent of gene duphcation, small repeats, 
and the size of intergenic spacers [22]. The information on 
sequence insertion or deletion, transition or transversion, and 
nucleotide repeats may help to clarify evolutionary relationships 
[23-27]. To date, cp genomes from seven orchid genera 
{Corallorhiza, Opmhidium, Erycina, Neottia, Oncidium, Phalaenopsis, and 
Rhizanthella) have been sequenced. The former six genera belong to 
the subfamily Epidendroideae, whereas the last one falls into the 
subfamily Orchidoideae. All species in these seven genera are 
photosynthetic orchids except Rhizanthella gardneri, Corallorhiza 
striata, and .Neottia nidus-avis being nonphotosynthetic plants 
[17,28-34]. However, the cp genome of the Dendrobiinae, the 
largest and most economically important subtribe in the Epiden- 
droideae, has not yet been sequenced. 

Dendrobium officinale Kimura et Migo, a perennial epiphytic herb 
of the Dendrobiinae, is endemic in moderately damp mountains in 
China [35]. The stems of D. officinale have been widely used as a 
traditional Chinese medicine (TCM) called "Tiepi Fendou." The 
efficacious compounds in D. officinale include phenols, alkaloids, 
coumarins, and polysaccharides [36]; and its medical benefits 
include stimulation of saliva, improvement in eyesight, warming of 
the stomach, enhancement of immunity, and inhibition of tumor 
growth [36]. As a result of its habitat shrinking and human 
overexploitation, natural populations of D. officinale have been 
progressively destroyed, and in 1992 it was classified as an 
endangered species in the Chinese Plant Red Book [37]. D. 
officinale has recentiy been rescued by tissue culture in southern 
China. 

The subfamily Cypripedioideae comprises approximately 155 
species in five genera [2] . All Cypripedioideae species have special 
flowers with a saccate lip, two fertile stamens, a shield-like 
staminode, and a synsepal composed of fused lateral sepals [38]. 
Because of its attractive morphological characteristics, this 
subfamily has been investigated widely in theoretical and apphed 
research. Nonetheless, molecular information on this subfamily is 
still limited. Cypripedium macranthos Sw. is a terrestrial herbaceous 
plant in the subfamily and naturally distributed in East Asia [39]. 
Because of the commercial value of its pretty red or pink flowers, it 
has been cultivated as a potted and garden plant. 

In this study, we sequenced the complete cp genomes of D. 
officinale and C. macranthos using a next-generation sequencing 
(NGS) approach. Our objectives were to deepen understanding of 
the structural diversity of orchid cp g(;n()m(;s and to provide 
information for resolving uncertain relationships within the 
Epidendroideae. The cp genomes of seven photosynthetic orchid 
species (C. macranthos, Cymbidium mannii, D. officinale, Erycina pusilla, 
Oncidium Gower Ramsey, Phalaenopsis aphrodite, and Phalaenopsis 
equestris) were compared to c'lucidate the diversity of gene order, 
gene content, and genome structure among them. Four regions 
were filtered according to the sequence divergence of protein- 
coding genes, and 56 taxa from 36 genera were used as a case 
study to determine phylogenetic relationships within the Epiden- 
droideae. 



Materials and Methods 

Chloroplast DNA extraction and genome sequencing, 
assembly, and PCR-based validation 

This study was approved by the Ethics Committee of Forestry 
Bureau of Zhejiang Province and Nanjing Normal University, 
China. We collected seeds of D. officinale from Yandang 
experimental base of Zhejiang Branch, College of Life Sciences, 
Nanjing Normal University. 

Young leaves of D. officinale were taken from 6-month-old 
seedlings grown in a greenhouse. Intact chloroplasts were isolated 
using the Percoll gradient method (22^5%) [40]. Purified 
chloroplast DNA was extracted according to the 2x CTAB 
protocol [41]. Fresh leaves of C. macranthos were collected from 
Yunnan Province, China. Total DNA was extracted using a 
Qiagen DNeasy plant mini kit (Qiagen, Germany). DNA 
concentration and quality were determined using a NanoDrop 
8000 Spectrophotometer (Thermo Scientific, Wilmington, DE). 
High quality DNA (concentration >300 ng/jil, A260/280 
ratio = 1.8-2.0 and A260/230 ratio>1.7) was used for sequencing. 

Purified DNA was fragmentcxl and usf d to construct sh<)rt-ins(;rt 
libraries (insert size ~ 500 bp) according to the manufacturer's 
instructions (Illumina). The short fragments were sequenced using 
an Illumina Hiseq 2000 sequencing system [42]. 

The raw reads for D. officinale were trimmed with error 
probabilit)' <().()() 1 and assembled using SOAPdenovo version 
1.05 with default parameters [43]. The de Bruijn graph approach 
was applied to assembly with an optimal K-mer size of 79. The 
contigs shorter than 200 bp were removed. Then the paired-end 
information was used to join the contigs into scaffolds with the cp 
genome of P. aphrodite (Accession Number: AY9 16449) as a 
reference. Gaps among scaffolds were filled using paired-end 
extracted reads. 

The short reads for C. macranthos were trimmed with error 
probability <0.05 and assembled using CLC Genomic Work- 
bench 6.0.1 (CLC Bio, Aarhus, Denmark). The contigs shorter 
than 200 bp were discarded; others were compared with plant cp 
genomes in the National Center for Biotechnology Information 
(NCBI) using BLAST (http://blast.ncbi.nlm.nih.gov) searches. 
Contigs matching referenced genomes with i? values <10~^ were 
selected for annotation. 

Based on the reference genomes in Orchidaceae [28,29], gaps 
and four junction regions between LSC /SSC and IRs were 
confirmed by PGR amplification and Sanger sequencing using the 
primers listed in Table SI. 

Genome annotation 

Protein-coding and ribosomal RNA genes were annotated using 
DOGMA (http://dogma.ccbb.utexas.edu/) [44]. The boundaries 
of each annotated gene were manually determined by comparison 
with orthologous genes from other orchid cp genomes. Genes of 
tRNA were predicted using tRNAscan (http:/ /lowelab. ucsc.edu/ 
tRNAscan-SE) [45] and ARAGORN version 1.2 (http://130.235. 
46.10/ARAGORN/) [46]. The circular genome maps were 
drawn using Genome Vx, followed by manual modification [47]. 
The sequencing data and gene aimotation were submitted to 
GenBank with accession numbers KC771275 and KF925434. 

Analyses of RNA editing sites 

Thirty protein-coding genes of D. officinale and C. macranthos. cp 
genomes were used to predict potential RNA editing sites using the 
online program Predictive RNA Editor for Plants (PREP) suite 
(http://prep.unl.edu/) [48] with a cutoff value of 0.8. 
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Phylogenomic analyses 

Sixty-three common protein-coding genes were extracted from 
10 cp genomes. Seven photosynthetic orchid species were involved 
in analyses with Calamus carjotoides, Phoenix dactylifera, and Tjpha 
Mifolia as outgroups. The GenBank accession numbers of all taxa 
are shown in Table S2. The accD, inJK, rpsl6, rpsl9,ycfl, and mffi 
genes were not included in the data set because they were 
pseudogenized in some cp genomes. Alignments were performed 
using the MUSCLE program in Mega 5.0.3 [49], without 
including gaps, and start and stop codons. The ahgned sequences 
were concatenated and used for phylogenetic reconstruction. 

The ML tree was constructed by means of GTR+G model with 
raxmlGUI version 1.2 (http://sourceforge.net/projects/raxmlgui/) 
[50] and a rapid bootstrap value of 1,000. A Bayesian inference (BI) 
tree was constructed using CAT model with PhyloBayes version 3.2 
[51]. Two Independent MCMC chains were run. The first 25% of 
the cycles were removed as burn-in, and convergence of three 
chains was checked on the basis of maxdiff <0.3 by following the 
PhyloBayes manual. 

Sequence divergence of protein-coding genes 

To obtain suitable markers for phylogenetic analysis within 
subfamilies, complete cp genomes of six orchid species (C. 
macranthos, C. mannii, D. ojficinak, E. pusilla, 0. Gower Ramsey, 
and P. aphrodite) were applied. The average pairwise distances of 
nucleotide and protein substitutions for 68 protein-coding genes 
were estimated using Kimura's two-parameter model and p- 
distance, respectively, in Mega 5.03 [49]. 

Phylogenetic application of cp genomes, a case study on 
the Epidendroideae 

We selected the Epidendroideae as an example for phylogenetic 
analysis. Data sets for four incomplete gene sequences {ycfi , maiK, 
ccsA, and accD) were obtained for 56 taxa from 36 genera. The 
data matrix included 11 subtribes and one tribe in the 
Epidendroideae, with C. caryotoides, P. dactylifera, and T. latifolia as 
outgroups. Six taxa from two additional orchid subfamilies were 
used as internal checks. Sequences from 10 of the taxa were 
extracted from complete cp genomes; sequences from the other 46 
taxa were obtained by PGR amplification and sequencing of PGR 
products widi an ABI PRISM 3730XL DNA analyzer (Applied 
Biosystems). Primers for accD and ccsA were designed using Primer 
Premier version 6 [52] based on homologous serjuences from 
orchid cp genomes (Table S3). All newly generated sequences were 
deposited in GenBank with accession numbers KF361524- 
KF361707. Sources of species and GenBank accession numbers 
are indicated in Table S4. These regions were aligned separately 
using Mega 5.03 [49] with manual modifications, and gaps were 
coded as Sequence information was analyzed using Mega 
5.03 and DnaSP version 5.0 [53]. The comljincd matrix was 
utilized for phylogenetic analyses. Modeltest version 3.7 [54] was 
employed to select the best nucleotide substitution model under 
the Akaike Information Griterion (AIG); the GTR+I+G model was 
chosen as the best fit for our data set. The ML and BI analyses 
were performed according to the same protocol as that used for 
phylogenomic analyses. 

Results 

Sequencing and genome assembly 

The raw lUumina paired-end sequencing of D. officinale 
produced 350 Mb of data. After quahty trim, 210 Mb of data 
remained with an average read length of 80 bp. The subsequent de 
novo assembly produced 13 scaffolds, 12 of which were >2 Kb and 



the scaffold N50 size was 84,55 1 bp. The average coverage depth 
was 1,400 X. These scaffolds were used for the following assembly. 
We sequenced 2.5 Gb of lUumina paired-end reads for C. 

macrantho.s (average read length of 90 bp). The initial assembly 
included 12,148 contigs. After compared with plant cp genomes, 
41 contigs were obtained with E values< 10~' and mean coverage 
depth =26x. Four of these contigs were larger than 10 kb with 
average depth coverage 129x, resulting in a nearly complete draft 
genome. After assembly and gap closure, two complete chloroplast 
genomes were obtained. 

Characteristics of the chloroplast genomes of 
Dendrobium officinale and Cypripedium macranthos 

The complete cp genomes of D. officinale and C. macranthos were 
circular, having 152,221 and 157,050 bp, respectively. Similar to 
other angiosperms, both cp genomes were AT-rich (62.53% and 
62.17%, respectively). The D. officinale plastome contained 110 
diSerent genes, of which 91 were single-copy genes and 19 were 
duplicated genes (Fig. 1). Its cp genome consisted of 76 protein- 
coding genes, 4 rRNA genes, and 30 tRNA genes. C. macranthos 
encoded 1 1 3 different genes (94 single-copy and 1 9 duplicated 
genes). The C. macranthos cp genome included 79 protein-coding 
genes, four rRNA genes, and 30 tRNA genes (Fig. 2). The gene 
content of the D. officinale cp genome was relatively conserved 
compared with other known orchid cp genomes. The gene content 
of the C. macranthos cp genome was also relatively conserved with 
the exception of the following. A coding sequence (GDS) of ire/A 
(coding for translation initiation factor) was interrupted because of 
a 5-bp deletion (53 bp downstream of the start codon). This gene 
was lost from Discorea in monocots [55]. At the N terminus of )^,(19, 
a surplus nucleotide A in the poly (A) tract interrupted the open 
reading frame (ORF), causing a frameshift. Furthermore, we 
recognized rpslQ ?is a pseudogene because a partial intron and the 
second exon were missing in it. 

Potential RNA editing sites 

In the present study, potential RNA editing sites were predicted 
for 30 genes; as a result, a total of 51 RNA editing sites were 
identified in genes of Cypridium and Dendrobium (Table S5). No 
potential editing sites were identified in seven genes {petD, pelG, 
peth, psbB, psbJL, psbh, and rpl2S) in both cp genomes. Of the 51 
editing sites, 9 (17.6%) and 42 (82.4%) were located at the first and 
the second codon position, respectively, in Cypripedium; 8 (15.7%) 
and 43 (84.3%) were located at the first codon and the second 
codon position, respectively, in Dendrobium; but no editing sites 
were found at the third codon position. Just as in other terrestrial 
plants, the editing types in Cypripedium and Dendrobium were all C- 
to-U [56-58]. The amino acid conversion S to L occurred most 
frequentiy, while P to S and R to C occurred least. Thirty-four 
common RNA editing sites were shared in genes of the two 
species. We also observed RNA editing (C to U conversion) in the 
initiation codon of rpl2 transcripts of D. officinale, which is a 
common phenomenon among angiosperms and has been verified 
in P. aphrodite and R. gardneri [28,30]. 

Phylogenomic analyses of the seven orchids 

Our phylogenomic construction was based on 63 protein- 
coding genes of cp genomes, and the aligned data set comprised 
47,736 bp. The BI and ML trees had the same topology (Fig. 3), 
demonstrating that Cypripedium (Cypripedioideae) was sister to the 

Epidendroideae. In the Epidendroideae. Dendrobium was sister to 
other species, and Cymbidium and Oncidium-Eryana were sister to 
Phahenopsis. 
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Figure 1 . Map of the chloroplast genome of Dendrobium officinale. Thick lines indicate inverted repeats (IRs). Genes shown inside the circle are 

transcribed clockwise, and those outside the circle are transcribed counterclockwise. 

doi:10.1371/journal.pone.0099016.g001 
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Figure 2. Map of the chloroplast genome of Cypripedium macranthos. Thick lines indicate inverted repeats (IRs). Genes shown inside the circle 

are transcribed clockwise, and those outside the circle are transcribed counterclockwise. 

doi:10.1371/journal.pone.0099016.g002 



Comparison of chloroplast genomes of seven 
photosynthetic orchids 

Six photosynthetic orchid species representing four subtribes of 
the subfamily Epidendroideae — Cymbidiinae (C mannii), Aeridi- 
nae (P. aphrodite and P. equestris), Oncidiinae (O. Grower Ramsey 
and E. pusilla), and Dendrobiinae {D. officinale) — ^were compared 
with the reference species C. macranthos in the organization and 
gene content of their cp genomes. The seven cp genomes ranged 
from 146,484 to 157,050 bp (average length = 150,307±4,889 bp) 
(Table 1). Compared with C. macranthos, the other taxa had 
reduced IR length. The organization of the cp genomes of the 



Epidendroideae was similar to that of the C. macrantlios, except for 
three sequences: 'Pycfl-ndhF, ndhC-ndh], and ndhD-ndhli. Varia- 
tions in 'PycJX-ndhY sequence were due to reductions in the lengths 
of 'Pycfl, ndhF, and 'Pycfl-ndhF non-coding regions located at the 
IRg/SSC junction. Variations in ndhC-nd/iJ and ndhD-ndhH 
sequences were caused by pseudogenization or loss of ndh genes. 

Comparison of sequences flanking IR/SC junctions in the 
Orchldaceae 

Sequences flanking IR/SC (single copy) junctions vary among 
cp genomes of different species [59]. Here, we compared 
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Figure 3. Phylogenomic tree based on 63 protein-coding genes. Only the Bl tree is shown because Bl and ML trees had identical topologies. 
Numbers near branches are posterior probabilities for Bl analysis and bootstrap values for ML analysis. The degenerate ndh genes are mapped in the 
tree. Solid, empty, and gray bars show the distribution of ndh genes in orchids, indicating intact, lost, and pseudogenized genes, respectively. 
doi:10.1371/journal.pone.0099016.g003 



sequences flanking IR/SC junctions among seven orchid cp 
genomes (Fig. 4); all of them were found to have similar structures 
at the IR/LSC junction. The tmii-rpsl9 cluster was duplicated 
and involved in IR. The IRb/LSC junction (Jlb) was located 
within rpl22 in all seven orchid cp genomes. As a result, a 
duplicated Wrpl22 was nested within IRa. 

On the other side, the orchid chloroplast genomes had distinct 
characteristics at the IR/SSC junction. In P. Aphrodite, the IRa/ 
SSC junction (Jsa) was located upstream ofjc/1, whereas in other 
species JsA was located within jc/l. Four types of junctions in the 
orchid cp genomes were characterized on the basis of the 
organization of genes flanking the IRb/SSC junction (Jsb)- 
Cypripedium and Dendrobium shared type I structure in which Jsb 
was located upstream of the ndhF-rpl32 cluster. Type II junction 
was found in Cymbidium and was characterized by an overlap 
between 'Pycfl and ndhF, resulting in Jsb being located within these 
two genes. Type III was shown in Oncidium, Erycina, and P. equestris, 
in which Jsb was located inside the 'PycJ\-rpl32 cluster, with the 
loss of nd/iF gene. The type IV structure was present in P. aphrodite 
and characterized by the entire incorporation of the entire ycj\ 
into the SSC, with Jsb inside tmN-rpl32. 

Chloroplast-encoded ndh genes in seven orchid species 

Chloroplast-encoded ndh genes were investigated in C. macrantlios 
and the six photosynthetic Epidendroideae species (Fig. 3). The 1 1 
ndh genes in Cypripedium cp genome were intact, but many ndh 
genes had either truncations or indels, resulting in frameshifts or 
pseudogenes in the six Epidendroideae cp genomes. The ndKD 
gene in all these Epidendroideae species contained indels or stop 
codons. The characteristics of other ndh genes differed among the 
genera. In Dendrobium, ndhB was intact; ndhC, I, and K were lost; 
and ndhF was truncated with two sequence inserts, creating two 
frameshifts. In the two Phalaenopsis species the ndhA and ndhF genes 
were absent and the remnants of seven ndh genes became 
pseudogenes. The ndhF genes in P. equestris and P. aphrodite were 



lost and incomplete, respectively. The two Oncidiinae species 
{Erycina and Oncidium) had similar patterns of diversity of ndh genes 
except ndhA, ndhF, and ndhl. Other varieties within Oncidiinae 
shared major characteristics of ndh genes in Erycina and Oncidium 
[29]. In Cymbidium, most of the ndh genes were present in the ORE 
and remained intact [17]. 

Sequence divergence of protein-coding genes in the 
Orchidaceae 

The pairwise distances of nucleotide and protein substitutions of 
68 protein-coding genes were compared among six orchid species 
(Table 2). According to the average pairwise distance and numbers 
of nucleotide substitutions, three genes irpsl , rpl2, and r/)/23) 
located in the IR regions had relatively low mean levels of 
sequence divergence. The rpl and rps genes in the LSC and SSC 
regions showed higher evolutionary rates. Fifteen regions with 
relatively high divergence were identified in rps\(), ycfl, mat}^, 
rpslb, rpl22, ccsA, psal, rpB2, rpllG, rpl20, atpF, psbK, psbT, accD, 
and rps8, located in the LSC, SSC, or SSC/IR junction regions. 
Similar patterns of divergence were also observed at the protein 
level, with the exception of pshT. Sequence divergence and gene 
length yielded a sufficient variety of loci (>600 bp); thus, the 
sequences of accD, ccsA, ma/K, and ycfl were identified and used 
for phylogenetic analyses. 

Molecular phylogeny within the Epidendroideae 

To determine the availability of the accF), ccsA, matK, and ycfl 
sequences for phylogenetic analyses, the Epidendroideae was used 
as a case study because of the disputes regarding its systematics. 
Sequences of the four genes were successfully amplified in all 46 
taxa. The aligned combined dataset comprised 4,593 characters, 
of which 2,839 represented variable sites and 1,447 were 
parsimony-informative sites. The number of variable sites was 
highest in ycfl and lowest in ccsA (Table S6). 
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Figure 4. Comparison of the regions flanicing the junctions (Jlb, J|.a, Jsb' srid Jsa) among seven orchid chloroplast genomes. Four types 
of junctions are present at the Jsb in seven orchid species. Numbers in green indicate the length of '/'rp/22. Numbers in orange indicate the distance 
between ndhF and Js^. Numbers in purple indicate the distance between 5'-ycf1 and Js^. Numbers in blue indicate the distance between rps19 and Jl^. 
This figure is not to scale. 
doi:1 0.1 371/journal.pone.009901 6.g004 



PLOS ONE I www.plosone.org 



8 



June 2014 | Volume 9 | Issue 6 | e99016 



Comparison and Application of Orchid Cp Genomes 




PLOS ONE I www.plosone.org 



9 



June 2014 | Volume 9 | Issue 6 | e99016 



Comparison and Application of Orchid Cp Genomes 




PLOS ONE I www.plosone.org 



10 



June 2014 | Volume 9 | Issue 6 | e99016 



Comparison and Application of Orchid Cp Genomes 



Phylogenetic analyses using BI and ML approaches resulted in 
the same topology (Fig. 5). Most nodes had high support among 
tribes and subtribes within the subfamily Epidendroideae. Within 
it, Coelogyninae was sister to all the other subtribes or tribes, with 
strong support (ML BP 100%, BI PP 1.00). The Bulhophyllum group 
clustered with the Epigeneium and Dendrobium-Flickingeria to form a 
monophyletic clade of Dendrobiinae that was closely allied to 
Malaxideae (Laparis and Oberonm). The Dendrobiinae-Malaxideae 
clade was sister to the rest of the subfamily. PodochUinae and 
Eriinae were not monophyletic clades; these two subtribes (both of 
tribe Podochfleae) were sister to Collabiinae. 

Discussion 

Comparison of RNA editing sites 

Involved in plastid posttranscriptional regulation, RNA editing 
provides an effective way to create transcript and protein diversity 
[60,61]. Some chloroplast RNA editing sites are conserved in 
higher plants [62,63]. In Orchidaceae, RNA editing sites were 
identified in 24 protein-coding transcripts in P. aphrodite [63]. 
Potential editing sites also were identified in P. equestris and 0. 
Gower Ramsey [33]. Of the examined 30 genes in above- 
mentioned seven orchids, 1 5 potential RNA editing sites out of 1 1 
genes [atpA, atpF, dpF, matK, petB, psbV, rpl20, rpoA, rpoB, rpoC 1 
andjcfSj were shared; the number of shared editing sites increased 
in Epidendroideae species (28 sites out of 16 genes) (Table S5 and 
[33]). Therefore, RNA editing is more conserved from the same 
subfamily than which from different subfamily. However, orchids 
and other angiosperms have relatively less common editing sites. 
For example, 10 potential RNA editing sites were shared by 
orchids and Cocos nucifera; comparisons among Mcotiana tabacum, 
Arabidopsis thaliana, grasses and orchid RNA editing sites showed 
low conservation of editing sites (only one common editing sites in 
?/)oB)(Table S5). These cases indicate that the evolutionary 
c<)nscr\ ation of RNA editing is essential for only a few plastid- 
editing sites [64—66]. 

IR expansion or contraction in the Orchidaceae 

The variability of genes flanking IR/SC junctions results in IR 
expansion or contraction [59,67]. At the IR/LSC boundaries, 
most IRs of non-orchid monocots contained trnll-rpsl9 gene 
clusters, excluding Wrpl22 genes, leading to more-progressive 
expansion of IRs than that having occurred in non-monocot 
angiosperms [17,20,55,59,68-71]. In all known photosynthetic 
orchid cp genomes, trnii-rpslQ clusters and 'Prpl22 genes were 
involved in IRs at the IR/LSC junctions. The IR/LSC junctions 
were the standard type III [71], and IRs experienced the largest 
expansion at the IR/LSC junction compared with other 
monocots. 

The IR/SSC junction types II-IV in ore hids diflFcred from those 
in other monocots, while type I (in Cypripedium and Dendrobium cp 
genomes) was similar to that in Acoms (Fig. 4) withjc/l extending 
over the Jsa xad^ycjl located within IR adjacent to the Jsb- 
Although Yang et al. suggested most likely evolutionary routes of 
IRs in monocots [59], no studies have proposed a model about the 
evolutionary dynamics of the IR/SSC junctions within orchids. 
Here, we hyjoothesize two evolutionary routes to explain the 
expansion or contraction of IRs adjacent to IR/SSC junctions 
from an ^coruv-like ancestor to the existing orchids. The first route 
proceeded from type I to type ll\ycj\ further expanded into the 
IRa, resulting in an expansion of duplicated 'Pycjl in the IRb. 
During this period, an overlap occurred between ndW remnant 
and fycfl. On the second route, jc/1 shifted continuously into the 
SSC, resulting in a shorter, duplicated Vycfl adjacent to the Jsb- 



Continually, jvc/1 was embedded completely into the SSC, leading 
to the loss of duplicated Vycjl. This contractive process of IR 
involved the structural change from type I to type IV via type III. 
Moreover, IRs (-xpansion or contraction may not correlate with 
the taxonomic relationships. More molecular data need to be 
collected for intensifying our understanding of variations in 
sequences flanking IR/SSC junctions. 

The shift of the border between the IR and SSC in orchids was 
associated with the ycfi gene. Compared with the average AT 
content of protein-encoding genes, all known orchid ycfi genes 
exhibited usage bias of AT base pairs (sec Tabic 1 and Table S7). 
AT base pairs are bound by two hydrogen bonds, while GC base 
pairs are bound by three hydrogen bonds; therefore, DNA with 
high AT content is less stable than that with low AT content. Poly 
(A) tract sequences at IR/LSC boundaries might be closely linked 
with the dynamics of IR/LSC junctions and expansion of IR 
[67,71]. Similarly, the AT-rich nature oiycjl gene may be linked 
to the recombination of IR/SSC junction. 

The loss or pseudogenization of ndh genes in orchid 
chloroplast genomes 

Instances of gene loss or pseudogenes have been elucidated in 
the cp genomes of monocots [21]. Chloroplast-encoded gene 
degeneration in photosynthetic orchids is mostly embodied in 
structural changes of ndh genes. There are 1 1 chloroplast-encoded 
ndh genes in the [:p genomes of land plants, located in several 
transcriptional units and encoding for the thylakoid Ndh complex 
[72]. Non-functional chloroplast-encoded ndh genes have been 
found in CAM and C3 plants [32], including gymnosperms and 
grasses [68,73,74]. Sequence truncations and indels are common 
phenomena in orchid chloroplast-encock-d ndh gf;n(\s [17,28- 
33,75]. Pseudogenization or loss of the ndh gene did not correlate 
well with the divergent patterns of Epidendroideae lineages 
observed in the phylogenetic trees (Fig. 3). However, 10 common 
ndh pseudogenes of two Phalaenopsis species showed a high degree 
of similarities in sequence and indel patterns [33]. Both Erycina and 
the allied genus Oncidium lost two ndh genes (ndhV and ndhK) and 
had six pseudogenes (ndhB, C, D, G, and J); similar results were 
obtained from Oncidium and related Oncidiinae varieties [29]. 
Thus, we infer that relative species had similar patterns of 
variation in ndh gene content. 

The loss of some chloroplast-encoded genes might not affect the 
plant fife cycle. Gene transfer from chloroplast to nucleus is known 
to occur frequentiy during evolutionary processes [76]. The 
ancestral plastid ndh genes of orchids are presumed to have been 
transferred to the nucleus [28]. Moreover, fungal symbionts may 
contribute to the fate otndh genes [77]. Therefore, the functions of 
lost chloroplast-encoded ndh genes could be performed by 
homologous genes from other resources; this hypothesis needs to 
be tested in the future study. 

Phylogenetic relationships based on complete cp 
genomes 

The cp sequences have been used in deep phylogenetic analyses 
because of their low substitution rates [20,78]. Phylogenetic 
analyses based on complete chloroplast genomes have resolved 
some bewildering relationships in angiosperms. Using two tree 
construction methods with different models, we obtained consis- 
tent results on the relationships among Phalaenopsis (Aeridinae), 
Cymbidium (Cymbidiinae), Dendrobium (Dendrobiinae), Oncidium and 
Eiydna (Oncidiinae) within Epidendroideae, which are congruent 
with mfliK and rbcL analyses by Gustafsson et al. (2010) [8] and 
morphological cladistic analysis by Freudenstein and Rasmussen 



PLOS ONE I www.plosone.org 



11 



June 2014 I Volume 9 | Issue 6 | e99016 



Comparison and Application of Orchid Cp Genomes 



1/100 I — Phalaenopsis aphrodite 



1/100 



1/95 



1/92 



1/95 



1/100 



1/100 



-//- 



1/100 



0.99/86 
1/97 



1/94 



Phalaenopsis equestris 
[choenorchis tixieri 
Gastrochiius beilinus 
Schoenorchis gemmata 



pTidS'' 
I 1 



r——/iolcoglossum flavescens 



rl 1/100 ^°^'^°Slossum amesianum 
-I I — Vanda alpina 

Renanthera imschootiana 

1/78p Rhyncholaelia glauca 

Sophronitis cemua 



Aeridinae 



'I'or- r 

1/1 nn I 



0.61/77 



1/100 



Encyclia cochleata 

0.99/82 I Cymbidium wenshanense 

Cymbidium ebumeum 
Cymbidium mannii 

Oncidiiini Grower Ramsey 

Erycina pusilla 



1/100 



1/100 



0.97/57 



1/100 



- Maxillaria variabilis 
Zygopetaium macl<ayi 



0.63/34 1 rucnua navuii 

0.72/56JI Calanthe argenteo-striata 

1/100 1 Phaius wallichii 



Phaius flavus 



0.99/87 



— I 1/100 
L- l 1/89 



Phaius wallichii 
Calanthe labrosa 
Eria striata 



Eria coronaria 
Appendicula comuta 



1/100 



1/100 



c 



Liparis bootanensis 
Liparis distans 



Obemnia ensifomiis 



Laeliinae 

Cymbidiinae 

Oncidiinae 

Maxillariinae 
Zygopetalinae 

Collabiinae 

Eriinae 
Podochilinae 

Malaxideae 



1/9 
1/100 

L-l 1/99 



k: 



Dendrobium officinale 
Dendrobium moniliforme 



1/97 



r— Dendrobium parishii 
' 1/74 



H 1/74 

_ I— Dendrobium loddigesii 
1/IOD Dendrobium aphyllum 



1/100 



D0|- 



1/100 



Flickingeria comata 



1/100 



0.95/100 



Dendrobium Idngianum 
Epigeneium amplum 
I Bulbophyllum ambrosia 

'^^^ulbophyllum andersonii 



/1 00 



— Bulbophyllum affine 
Coelogyne viscosa 

Cgelogyne fimbriata 
74/56 

^holidota longipes 



1/100 



Pholidota chinensis 



0.99/931 



Thunia alba 
Bletilla striata 



Dendrobiinae 



Coelogyninae 



1/100 



1/100 



1/100 



Anoectochilus roxburghii 
Zeuxine nervosa 



1/100 



Goodyera schlechtendaliana 

Paphiopedilum hirsutissimum 
— Paphiopedilum malipoense 



o 
■g 

Q. 

LU 



Orchidoideae 



Cypripedium macranthos 



1/100 



Phoenix dactylifera 



1/100 



Calamus caryotoides 
Typha latifolia 



Cypripedioideae 



0.03 



ONE I www.plosone.org 



12 



June 2014 I Volume 9 | Issue 6 | e99016 



Comparison and Application of Orchid Cp Genomes 



Figure 5. Phylogenetic tree of the Epidendroideae reconstructed based on combined genes. Bl and ML analyses yielded identical 
topologies. Posterior probability and bootstrap proportion are indicated near the nodes. Subfamilies, tribes, and subtribes (sensu Chase et al. [2]) are 
indicated where applicable. 
doi:l 0.1 371/journal.pone.009901 6.g005 



(1999) [12]; but are inconsistent with the analyses based on nuclear 
ribosomal internal transcribed spacer (nrlTS), matK, rbcL, tmL-F, 
the trnL intron, and nuclear Xdh gene [4,7]. However, whole- 
genome sequencing for sparse sampling can result in long-branch 
artifacts and incorrect evolutionary reconstructions [79]. There- 
fore, further genomic and taxon sampling will be necessary to 
resolve the relationships within this subfamily. 

Gene divergence based on comparative chloroplast 

genomes 

Variability of genes in cp genomes has been calculated 
according to nucleotide diversity in previous studies [80,81]. If 
we considered sequence divergence at the nucleotide and protein 
levels, rps7, rpl23, rpR, and j)cf2 were conserved with low 
evolutionary distance, with the exception of r^,s l9, which exhibited 
medium divergence in the IR regions. These results are consistent 
with previous reports of slower divergence of sequences in the IR 
regions compared to other regions [80,82]. Although the ycJZ gene 
has been demonstrated to be one of the most rapidly evolved genes 
among 16 vascular plant species [80], the present study showed 
that it had relatively slow nucleotide divergence and moderate 
protein divergence within the Orchidaceae. 

In this study, highly divergent genes were acquired according to 
pairwise distance of nucleotide substitutions. While ycjl was 
located at the IR/SSC junction, 14 other genes bordered the LSC 
and SSC regions, four of which were selected to construct 
phylogenetic trees. Of these, maiK and jcfl have been used in 
previous studies [4,6], while accD and ccsA were applied for the 
first time to the phylogenetic analysis of the subfamily Epiden- 
droideae in the present study. These genes can be used as good 
phylogenetic markers at the subfamily level because of the 
following three reasons. First, these regions are variable, which 
highlights their unusual evolutionary properties. According to the 
pairwise distance of protein substitutions (Table 2),jcfl and matK 
have high divergence, and a((D and ccsA have relatively moderate 
substitution rates that are higher than rbch, which has been used in 
previous systematic analyses within the Epidendroideae [4,5,75]. 
Second, these regions are sufficientiy long (>600 bp) to yield 
adequate loci for phylogenetic analysis. Third, the sequences are 
easily obtained by PGR amplification and relatively conservative 
for alignment. 

Phylogenetic reconstruction of the Epidendroideae 

The phylogeny of the Epidendroideae has long been debated. 
Here, eleven common subtribes and one tribe from Epidendroi- 
deae were used as a case study to identify the phylogenetic 
relationships within this subfamily using four cp scrjucnccs. Fig. SI 
illustrates the relationships among these subtribes or tribes in 
previous studies based on molecular data. With polyphyletic and 
paraphyletic groups excluded from phylogenetic analyses, major 
debates were the placement of Dendrobiinae, Malaxideae, and 
CoUabiinae, as well as identification of the basal sublrihc or tribe. 
On the basis of a concatenated data set, we clarified several 
relationships that were previously poorly resolved, and the 
majority of nodes at the subtribe level in the gene trees had high 
support. 

The placement of Coelogyninae (tribe Arethuseae) varied 
according to morphological and molecular proofs. Based on 
observing that Arethuseae has cormous and reed-stem habits. 



Dressier (1986) claimed that Arethuseae is the basal group in the 
"reed-stem" phylad [83]. Dressier (1990) divided advanced 
Epidendroideae into four major clades (Gastrodieae, Nerviieae, 
Cymbidioid phylad, and Epidendroid phylad); and placed 
Arethuseae and Dendrobioid subclade in Epidendroid phylad, 
and Maxillarieae, Cymbidieae, Malaxideae in Cymbidioid phylad 
[13]. Dressier (1993) held that Arethuseae appeared to be 
paraphyletic due to their ever-shifting boundaries and tenuous 
morphological definitions [84]. However, Van den Berg et al. 
(2005) fixed subtribe Goelogyninae in distinct positions based on 
different methods using nrlTS and four plastid sequences [4] . The 
results in this study strongly support that Coelogyninae was the 
most basal subtribe within the sampled subtribes, which was in line 
with the BI analysis by Van den Berg et al. (2005) [4], the MP 
analysis by Neubig et al. (2009) [6], and analyses by Gomiak et al. 
(2010) [7]. 

Previously, the placement of CoUabiinae and Dendrobiinae was 
problematic, but their positions ha\ e been recovered. CoUabiinae 
was polyphyletic based on matK. and rbcL [15]. Van Den Berg 
(2005) proposed CoUabiinae was in an unfixed position in MP and 
BI analyses, and Gorniak et al. (2010) posited that CoUabiinae was 
sister to Aeridinae and Eriinae with high support based on nuclear 
gene Xdh [4,7]. Our results suggest that CoUabiinae was sister to 
the PodochUinae-Eriinae (tribe PodochUeae) clade with moderate 
support; this is congruent with MP analysis of Van Den Berg 
(2005), which had weak support [4] . The positions of Dendrobii- 
nae and Malaxideae were also confirmed. Dressier (1990) placed 
Malaxideae and Dendrobieae in two separated groups, Cymbi- 
dioid phylad and Epidendroid phylad, according to reed stem, 
upper lateral inflorescences and spherical silica bodies [13]. Chase 
(2003) recognized Dendrobiinae as a subtribe rather than tribe 
Dendrobieae [2]. By inferring from nrlTS and four chloroplast 
sequences, Van den Berg et al. (2005) held that Dendrobiinae was 
beside Malaxideae [4]. Dendrobiinae is similar to Malaxideae in 
the synapomorphic state of the naked poUinium [84] . Like other 
analyses based on Xdh and rbcL, our analyses support that 
Dendrobiinae and Malaxideae were sister relatives [5,7], which 
was consistent with the morphological simUarities between them. 
Controversially, the position of Dendrobiinae-Malaxideae clade 
was going up in the analysis of Xdh [7] , and Podochileae was sister 
to this clade in the analysis of plastid gene rbch (bootstrap support 
<50%) [5]; however, this clade was sister to other clades except 
Coelogyninae with high support in the present study. More 
extensive sampling and sequencing of mitochondrial and nuclear 
genomes should be conducted to resolve uncertain relationships 
within the Epidendroideae with confidence. 

Conclusions 

In summary, complete chloroplast genomes can provide 
abundant information for resolving evolutionary questions. The 
gene content, organization, and sequence of chloroplast genome 
have been used as important markers in systematic research. This 
study determined complete cp genomes ofDendrobium qfficinak and 
Cjpripedium macranthos and compared cp genomes of seven 
photosynthetic orchids including the above two, which showed 
structural similarities but differences in IR/SSC junctions and ndh 
genes. We propose that the AT bias oiycJ\ in the Epidendroideae 
may be related to recombination of the IR/SSC junction. In 
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addition, relationships among subtribes and tribes in the subfamily 
Epidendroideae were resolved with high or moderate support in 
the present study. The highly divergent genes of cp genomes 
identified in this study can be used as markers in phylogenetic 
analyses. Further plastome sequencing of orchids will be necessary 
to clarify the diversity of chloroplast genomes and to improve our 
understanding of the relationships within this family. 
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